STO: A Danish Lexicon Resource - Ready for Applications
Abstract
This paper presents the STO lexicon, the most comprehensive computational lexicon of Danish developed for NLP/HLT applications, which is now ready for use. Danish was one of the 12 EU languages participating in the LE-PAROLE and SIMPLE projects, so it was natural to continue this work and build on the experience obtained from these projects. The material for Danish produced within these projects, further enriched with language-specific information, is incorporated into the STO lexicon. First, we describe the main characteristics of the lexical coverage and linguistic content of the STO lexicon; second, we present some recent uses and point to some prospective exploitations of the material. Finally, we outline an internet-based user interface, which allows for browsing through the complex information content of the STO lexical database and some other selected WRLs for Danish.

Project Objectives and Background

The objective of the Danish STO project (SprogTeknologisk Ordbase, i.e. Lexical Database for Language Technology) was twofold: first, to develop a flexible, large-scale lexical resource in order to remedy a general bottleneck problem for Danish NLP applications; second, to strengthen the position of Danish as a member of the still growing multilingual NLP/HLT community. The project background as well as various development aspects and stages were presented at previous LREC conferences. STO was a national collaborative project, initiated by CST and founded on a contract with the Danish Ministry for Science, Technology and Development. The project ran for three years, ending in February 2004. STO is well integrated with the European activities in the field of computational lexicon development for the following reasons. Danish was one of the 12 EU languages that were part of the PAROLE (LE2-4017, 1996-98) and SIMPLE (LE4-8346, 1999-2000) projects.
The LE-PAROLE/SIMPLE models and descriptive methods obtained the status of 'de facto standards' in the development of computational lexicons (Lenci et al., 2000). It was natural to continue this work and build on our experience gathered from these multilingual projects, the more so as other national projects, e.g. the Italian CLIPS (Ruimy et al., 2002), were started on the same basis. Even though a number of language-specific refinements, adaptations and extensions are implemented in STO, its model and descriptive method are kept compatible with the architecture and descriptive language shared by the lexicons developed within the PAROLE/SIMPLE framework. Therefore the STO lexicon can be linked to other lexicons that share the same features and can be exploited in …
Related resources
Current Developments of STO - the Danish Lexicon Project for NLP and HLT Applications
The Centre for Language Technology (Center for Sprogteknologi, CST) is in charge of a national project developing a large-scale Danish lexicon for HLT and NLP applications. The short name of the project is STO, which stands for SprogTeknologisk Ordbase (Lexical Database for Language Technology). The project is inspired by principles and methods applied in the multilingual LE-PAROLE project (1996...
A Corpus-based Syntactic Lexicon for Adverbs
Adverbs, a word class often neglected in the field of NLP resources, have lately been described in a computational lexicon produced at CST as one of the results of a Ph.D. project. The adverb lexicon, which is integrated in the Danish STO lexicon, gives detailed syntactic information on the type of modification and position, as well as on other syntactic properties of approx. 800 Danish ad...
Merging a Syntactic Resource with a WordNet: a Feasibility Study of a Merge between STO and DanNet
This paper presents a feasibility study of a merge between SprogTeknologisk Ordbase (STO), which contains morphological and syntactic information, and DanNet, which is a Danish WordNet containing semantic information in terms of synonym sets and semantic relations. The aim of the merge is to develop a richer, composite resource which we believe will have a broader usage perspective than the two...
Lemma selection in domain specific computational lexica - some specific problems
This paper describes the lemma selection process of a Danish computational lexicon, the STO project, for domain specific language and focuses on some specific problems encountered during the lemma selection process. After a short introduction to the STO project and an explanation of why the lemmas are selected from a corpus and not chosen from existing dictionaries, the lemma selection process ...
Using Danish as a CG Interlingua: A Wide-Coverage Norwegian-English Machine Translation System
This paper presents a rule-based Norwegian-English MT system. Exploiting the closeness of Norwegian and Danish, and the existence of a well-performing Danish-English system, Danish is used as an «interlingua». Structural analysis and polysemy resolution are based on Constraint Grammar (CG) function tags and dependency structures. We describe the semiautomatic construction of the necessary Norwe...
Publication date: 2004